Show&Tell: A Semi-Automated Image Annotation System - IEEE Multimedia

Authors

  • K. Srihari
  • Zhongfei Zhang
Abstract

Show&Tell, a multimedia system for semi-automated image annotation, combines advances in speech recognition, natural language processing, and image understanding. It differs from map annotation systems and has significant implications for situations where visual data must be coreferenced with text descriptions, such as medical image annotation and consumer photo annotation. Show&Tell takes advantage of advances in speech technology and natural language/image understanding research to make the preparation of image-related information more efficient. Specifically, we aim to identify relevant objects and regions in the image and to attach text descriptions to them. We use a combination of automated and semi-automated image understanding tools for object and region identification. Image analysts can use Show&Tell in applications where text descriptions must be coreferenced with image areas, such as medical image annotation, National Aeronautics and Space Administration (NASA) space photo annotation, and even consumer photo annotation. Medical images suit our system well, since radiologists already employ speech to dictate their findings, and robust image understanding technology is available for several areas, such as chest and lung radiographs. In a joint effort with Kodak, we are adapting our system for consumer photo annotation. Since still cameras can be fitted with microphones, speech annotation of photos is now possible. Consumers will be able to easily create searchable digital photo libraries of their pictures; we focus primarily on pictures of people in various contexts.

Multimedia input analysis

Multimedia systems involving speech and deictic input can be classified into two major categories: multimedia input analysis and multimedia presentation. Our work focuses on the former. Our system differs from previous work on adding text annotations to pictorial data in the following ways.
Most systems assume that there already exists an underlying semantic representation of the pictorial data. We don't. Following Clark's terminology,1 the region to which the user points is the demonstratum, the descriptive part of the accompanying text is the descriptor, and the region to which the user intends to refer is the referent. Much of the recent work in multimedia input analysis concerns disambiguating ambiguous deictic references, that is, determining which of the possible referents mapping to the same demonstratum the user intends.2 Accompanying linguistic input, in the form of speech, is used for this purpose. Such systems assume a type of deixis, known as demonstratio ad oculos, distinguished by the fact that the objects on display have already been introduced and that the user and the system share a common visual field. For example, in the case of maps, a graph represents the semantics; deictic references made with a mouse or other pointing device can thus be associated with the underlying geographical entity (or relationship).

Our work pertains to situations where the image hasn't been subjected to any previous semantic interpretation. Thus, when a user clicks on a building and supplies a name for it, the system doesn't initially know about the building. The mouse click could correspond to a single pixel, a region of pixels, or the entire image. In fact, one goal of our work is to examine the usefulness of text descriptions in image interpretation. Here, we focus on the design issues of developing an efficient, easy-to-use system. The major bottleneck in such situations is image interpretation. At one extreme, a highly automated system would first perform image interpretation without user input; the user could then attach text annotations to automatically detected objects and regions.
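The disambiguation step above can be sketched in a few lines: given a click (the demonstratum) and a spoken noun (the descriptor), pick the most specific candidate region whose label matches. This is a minimal illustration under our own assumptions; the names (`Region`, `resolve_referent`) and the smallest-matching-region heuristic are ours, not the actual Show&Tell implementation.

```python
from dataclasses import dataclass

@dataclass
class Region:
    label: str             # object type proposed by image understanding, e.g. "building"
    box: tuple             # bounding box (x0, y0, x1, y1) in image coordinates

def contains(box, x, y):
    """True if point (x, y) falls inside the bounding box."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def resolve_referent(regions, click, descriptor):
    """Resolve a deictic reference: the click is the demonstratum,
    the spoken noun is the descriptor, and the returned region is
    the referent. Among regions containing the click, prefer those
    whose label matches the descriptor; the smallest (most specific)
    region wins ties."""
    x, y = click
    candidates = [r for r in regions if contains(r.box, x, y)]
    matches = [r for r in candidates if r.label == descriptor] or candidates
    if not matches:
        return None
    return min(matches,
               key=lambda r: (r.box[2] - r.box[0]) * (r.box[3] - r.box[1]))
```

A single click may fall inside several nested regions (a window, the building containing it, the whole image); the descriptor is what lets the system choose among them, which is the core of the demonstratum/referent distinction discussed above.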
This would make the annotation task similar to the map annotation systems previously mentioned and permit nonexperts to use it. Recent approaches by the Defense Advanced Research Projects Agency (DARPA) image understanding community (the Radius program3) have targeted developing robust image understanding.


Similar articles


Show and Tell: Using Speech Input for Image Interpretation and Annotation

This research concerns the exploitation of linguistic context in vision. Linguistic context is qualitative in nature and is obtained dynamically. We view this as a new paradigm which is a golden mean between data driven object detection and site-model based vision. Our solution not only proposes new techniques for using qualitative contextual information, but also efficiently exploits existing ...

A Semi-Automated Framework for Supporting Semantic Image Annotation

Advanced semantic description of multimedia data significantly improves representing, labeling, and retrieving multimedia-based contents. In this paper we present an intelligent framework for attaching semantic annotations to image contents based on the extraction of elementary low-level features, user’s relevance feedback and the usage of ontology knowledge. This approach facilitates image ann...

Guide to Annotation

A review of multimedia annotation techniques, in particular image annotation, is presented. The annotation requirements for the Benchmarking workpackage of the MUSCLE EU Network of Excellence are also presented and discussed. A significant contribution is the creation of a keyword vocabulary based on an analysis of keywords used in experiments for testing automated image annotation algorithms a...

Fuzzy Neighbor Voting for Automatic Image Annotation

With the rapid development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...



Publication date: 2000